Self-organization of collaboration networks 

We study collaboration networks in terms of evolving, self-organizing bipartite graph models. 
We propose a model of a growing network, which combines preferential edge attachment with the 
bipartite structure, generic for collaboration networks. The model depends exclusively on basic 
properties of the network, such as the total number of collaborators and acts of collaboration, the 
mean size of collaborations, etc. The simplest model defined within this framework already allows us 
to describe many of the main topological characteristics (degree distribution, clustering coefficient, 
etc.) of one-mode projections of several real collaboration networks, without parameter fitting. We 
explain the observed dependence of the local clustering on degree and the degree-degree correlations 
in terms of the "aging" of collaborators and their physical impossibility to participate in an unlimited 
number of collaborations. 

I. INTRODUCTION 

Recent years have witnessed an upsurge in the study of complex systems that can be
described in terms of networks, in which the vertices represent the elementary units
composing the system, and the edges represent the interactions or relations between
pairs of units. These studies have led to the development of a modern theory of complex
networks, which has found fruitful applications in fields as diverse as the Internet,
the World-Wide Web [4], and biological interacting networks.

An important example of this kind of system, which has attracted a great deal of
interest from researchers in different scientific fields, is that of social networks [9].
The study of social networks has traditionally been hindered by the small size of the
networks considered and the difficulties in the process of data collection (usually from
questionnaires or interviews). More recently, however, the increasing availability of
large digital databases has made it possible to study a particular class of social
networks, the so-called collaboration networks. These networks can be defined in a
non-ambiguous way, and their exceptionally large size has permitted empirical
researchers to obtain a reliable statistical description of their topological
properties and to arrive at solid conclusions concerning their structure.

Social collaboration networks are generally defined in terms of a set of people (called
actors in the social science literature), and a set of collaboration acts. Actors relate
to each other by the fact of having participated in a common collaboration act. Examples
of this kind of network can be found in movie actors related by co-starring in the same
movie, scientists related by co-authoring a scientific paper, members of the boards of
company directors related by sitting on the same board, etc. Collaboration networks can
be represented as bipartite graphs with two types of vertices, one kind representing the
actors, while vertices of the other kind are acts of collaboration. As a rule, however,
it is the one-mode projections of these bipartite graphs that are empirically studied.
In these projections, the vertices representing the acts of collaboration are excluded,
and collaborating pairs of actors are connected by edges. Since multiple connections in
the projected graph are usually ignored, the projection is less informative than the
original bipartite graph.
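
As an illustration of the projection just described, the following minimal sketch builds the one-mode projection from a list of collaboration acts. The function name `one_mode_projection` and the toy data are hypothetical and are not taken from the paper; multiple connections are collapsed into a single edge, which is the convention the text says is usually adopted.

```python
from itertools import combinations


def one_mode_projection(acts):
    """Return the set of actor-actor edges implied by a list of collaboration acts.

    Each act is an iterable of actor labels. Repeated collaborations between the
    same pair are collapsed into one edge, so the multiplicity information of
    the bipartite graph is lost.
    """
    edges = set()
    for cast in acts:
        # Every pair of distinct actors in the same act becomes one edge.
        for a, b in combinations(sorted(set(cast)), 2):
            edges.add((a, b))
    return edges


# Toy example: A and B collaborate twice, but the projection keeps a single A-B edge.
acts = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
edges = one_mode_projection(acts)
print(sorted(edges))
```

The second act adds no new edge to the projection, which is one concrete sense in which the projection carries less information than the bipartite graph.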

The study of several examples of large collaboration networks [11, 12, 13] allows one to
draw a number of conclusions regarding the main topological properties of one-mode
projections of these networks: 

1. The degree distribution P(k), defined as the proba- 
bility that a vertex is connected to k other vertices, 
often exhibits a fat tail, that can be approximated 
by a power law behavior for large k. 

2. The clustering coefficient, roughly defined as the 
probability that two neighbors of any given vertex 
are also neighbors of each other, takes on average 
large values, and it locally depends on the vertex 
degree, signaling the presence of a structure in the 
network. 

3. The degrees of the nearest neighbor vertices are 
positively correlated, i.e., vertices with large degree 
have a high probability of being connected to vertices 
with large degree, and vice versa. This property 
has been dubbed assortative mixing [18]. 

The general presence of these three properties in most collaboration graphs calls for
the development of models capable of reproducing and explaining these features. In
general, the first insight into the architecture of a complex network is provided by
"formal" constructions of random graphs. These constructions allow one to reproduce the
structure of complex networks, but completely ignore the mechanisms underlying these
architectures. The minimal formal model of a complex one-partite graph, that is, a graph
composed of a single type of vertices, is the configuration model. In simple terms, the
configuration model generates (uncorrelated) graphs, which are maximally random under
the constraint that their degree distribution is a given one. Similarly, the minimal
model of a complex bipartite graph is a bipartite network that is maximally random under
the constraint that the two degree distributions for both kinds of vertices are given
[23]. One can see that this is a direct generalization of the configuration model to
bipartite graphs. The quality of the configuration model applied to bipartite graphs was
checked in Ref. [23]. In this work, it was shown that the empirical degree distribution
of the one-mode projection of a bipartite collaboration graph agrees with that of the
configuration model when the empirically observed degree distributions are imposed on
the two kinds of vertices. One should emphasize that a one-mode projection of an
uncorrelated bipartite graph is correlated. In particular, this projection contains
numerous triangles of edges, which results in a high clustering.

Therefore, it might seem at first sight that, in order 
to explain the nature of the structure of collaboration 
networks, it is sufficient (i) to propose a mechanism gen- 
erating the specific degree distributions of the two kinds 
of vertices and afterwards (ii) to connect vertices by using 
the configuration model. This approach, however, fails to 
reproduce the complex distribution of connections over 
collaboration networks, since it assumes pure randomness. 
Also, it does not explain specific distributions of vertex 
degrees, which, in the configuration model, are assumed 
to be given. Note that, while providing reasonable values 
of clustering, the configuration model fails to reproduce 
the type of degree-degree correlations in collaboration 
networks. Consequently, in order to fully explain the 
specific architecture of collaboration networks (fat-tailed 
degree distributions, high clustering, assortative mixing, 
etc.), we have to introduce a mechanism for the linking 
of vertices in these networks. 

In the present paper we propose a first approximation to such a mechanism. In our
approach, we treat collaboration networks as growing, self-organizing, correlated
bipartite graphs, applying to bipartite graphs the ideas at the basis of the
preferential attachment concept put forward by Barabási and Albert in the network
modeling context (see also Ref. [27]). The simplest model that we can define already
allows us to quantitatively describe most of the empirical data on collaboration
networks without fitting, only by using basic numbers characterizing the real networks.
We emphasize that the absence of fitting strongly supports the validity of the concept.

The degree-degree correlations in the one-mode projections of collaboration graphs are a
topic of our special interest. We show that the "assortative mixing" character of these
correlations is not as inevitable in collaboration networks as is usually believed. We
explain the origin of the assortative mixing in real collaboration networks in terms of
the aging of actors, who cannot accept new connections during the whole growth process
of the network.

The present paper is organized as follows. In Sec. II we review measurements defined to
characterize the topological properties of collaboration networks, both as bipartite
graphs and as their one-mode projections. Sec. III presents the existing empirical data
on collaboration networks, referring in particular to the networks of movie actors,
scientific coauthorship, and company directors. In Sec. IV we introduce a simple model
of a growing, self-organizing bipartite graph. Sec. V contains results obtained for this
model and a detailed comparison with empirical data. Separately, in Sec. VI we discuss
and explain the presence of positive correlations between the degrees of the
nearest-neighbor vertices in collaboration networks. In that Section we also discuss the
importance of the "physical" limitation of vertex degrees in collaboration networks.
Finally, in Sec. VII we draw the main conclusions of our work.



II. STRUCTURAL ORGANIZATION OF 
COLLABORATION NETWORKS 

As we have already mentioned in the Introduction, collaboration networks can be
represented as bipartite graphs [28]. On one side, we have collaboration acts (e.g.
movie co-starring or paper co-authorship, belonging to the same company, school, etc.),
which may be represented as a special kind of vertices. On the other side we have the
actors (normal vertices), which are linked to the collaboration acts in which they
participate. Two independent degree distributions may then be defined: first, the
probability S(n) of having n actors participating in any collaboration act; and second,
the probability Q(q) that any actor has taken part in q collaboration acts.

In most cases, however, the object of study is not the whole bipartite graph but its
one-mode projection, i.e., the network formed by the collaborating actors linked to each
other whenever they have shared a collaboration act. For this projected network, another
degree distribution P(k) may be considered, defined as the probability that any given
actor is connected to k others. Focusing on the one-mode projection of a collaboration
network, many other properties generally studied in common random graphs can be
measured. This type of study has already been carried out for several empirical social
networks (see Ref. [29] for a recent review). The quantities that we use to describe the
structure of the projected network are the clustering coefficient and the mean
clustering, the average clustering coefficient of vertices of degree k, the average
degree of the nearest neighbors of the vertices of degree k [34], and the Pearson
correlation coefficient defined in Refs. [18, 35].

The local clustering $c_i$ of the vertex $i$ is given by the ratio between the number of
triangles connected to that vertex, $s_i$, and the total number of possible triangles
including it, $k_i(k_i - 1)/2$, i.e.

$$ c_i = \frac{2 s_i}{k_i (k_i - 1)}. \tag{1} $$



To obtain the mean degree-dependent local clustering we average the local clustering
over all vertices with degree $k$ in a network,

$$ c(k) = \frac{s(k)}{k(k-1)/2}, \tag{2} $$



where $s(k) = \langle s_i(k) \rangle$ is the mean number of connections between the
nearest neighbors of a vertex of degree $k$. The mean clustering $\langle c \rangle$ is
defined as the average of the local clustering over all the vertices in a network, i.e.

$$ \langle c \rangle = \sum_{k>1} P(k)\, c(k) = \frac{1}{N} \sum_{i} c_i, \tag{3} $$

where $N$ is the total number of actors (vertices), and the second sum runs over the $N$
vertices of the network. The clustering coefficient of a graph (transitivity in
sociology) is defined as

$$ c = \frac{3 \times \text{number of triangles of edges in a graph}}{\text{number of connected triples of vertices}} = \frac{\sum_k P(k)\, s(k)}{\sum_k P(k)\, k(k-1)/2}. \tag{4} $$



The quantities $c(k)$, $\langle c \rangle$ and $c$ provide information on the
concentration of loops of length three in a graph, which is typically high in social
networks. Note that if the local clustering depends on the degree, then
$\langle c \rangle \neq c$, and the (relative) difference is great in many real-world
networks.
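
The three clustering measures defined above can be sketched in a few lines. This is only an illustrative implementation under stated assumptions: the function name `clustering_measures` and the toy graph are hypothetical, edges are assumed to be unique and without self-loops, and vertices of degree 0 or 1 are assigned zero local clustering.

```python
from collections import defaultdict
from itertools import combinations


def clustering_measures(edges):
    """Return (local clustering per vertex, mean clustering <c>, clustering coefficient c)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    local = {}
    closed = 0   # sum over vertices of s_i (each triangle is counted three times)
    triples = 0  # sum over vertices of k_i (k_i - 1) / 2, i.e. connected triples
    for v, nbrs in adj.items():
        k = len(nbrs)
        # s_i: number of edges among the neighbors of v
        s = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
        possible = k * (k - 1) // 2
        local[v] = s / possible if possible else 0.0  # convention for k <= 1
        closed += s
        triples += possible

    mean_c = sum(local.values()) / len(local)
    c = closed / triples if triples else 0.0
    return local, mean_c, c


# Toy graph: a triangle A-B-C with a pendant vertex D attached to C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
local, mean_c, c = clustering_measures(edges)
print(local, mean_c, c)
```

Even in this tiny example the mean clustering (7/12) and the clustering coefficient (3/5) differ, because the local clustering depends on the degree.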

The correlations between the degrees of connected vertices can be fully defined by means
of the joint probability $P(k, k')$, defined such that $(2 - \delta_{k,k'}) P(k, k')$ is
the probability that a randomly chosen edge connects two vertices of degrees $k$ and
$k'$ ($\delta_{k,k'}$ is the Kronecker symbol). By using this quantity one can compute
the average degree of the nearest neighbors of the vertices of degree $k$,
$\bar{k}_{nn}(k)$, defined as

$$ \bar{k}_{nn}(k) = \langle k \rangle \frac{\sum_{k'} k' P(k, k')}{k P(k)} = \sum_{k'} k' P(k'|k). \tag{5} $$

FIG. 1: (a) Probability distribution of the size of collaboration 
acts S(n) for the movie actors collaboration network (main 
plot) and the scientific collaboration network (inset). (b) Probability 
distribution that an actor has taken part in q collaboration 
acts Q(q) for the movie actors collaboration network 
(main plot) and the scientific collaboration network (inset). 
The solid line has slope -2. 



where $P(k'|k)$ is the conditional probability that a vertex of degree $k$ is connected
to a vertex of degree $k'$. In simple terms, if the network presents assortative mixing
(large degree vertices connect preferably with large degree vertices, and vice versa),
$\bar{k}_{nn}(k)$ increases with $k$ [38]. In the case of disassortative mixing (large
degree vertices connected with low degree vertices, and vice versa), $\bar{k}_{nn}(k)$
is conversely a decreasing function of $k$. Analogous information can be obtained by
means of the Pearson correlation coefficient, defined as

$$ r = \frac{\langle k \rangle \sum_k k^2\, \bar{k}_{nn}(k)\, P(k) - \langle k^2 \rangle^2}{\langle k \rangle \langle k^3 \rangle - \langle k^2 \rangle^2}. \tag{6} $$

Here positive (negative) values of $r$ imply the presence of 
assortative (disassortative) mixing. 
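
The two correlation measures can be estimated from an edge list as sketched below. This is a minimal illustration, not the authors' code: the function name `degree_correlations` is hypothetical, the edge list is assumed to contain each undirected edge once with no self-loops, and the Pearson coefficient is evaluated from the expression in terms of the average nearest-neighbor degree and the moments of P(k).

```python
from collections import defaultdict


def degree_correlations(edges):
    """Return (knn, r): average nearest-neighbor degree per degree class and Pearson r."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1

    # Sum of neighbor degrees for every vertex.
    nbr_sum = defaultdict(float)
    for a, b in edges:
        nbr_sum[a] += deg[b]
        nbr_sum[b] += deg[a]

    # Average neighbor degree, grouped by the degree k of the vertex.
    by_k = defaultdict(list)
    for v, k in deg.items():
        by_k[k].append(nbr_sum[v] / k)
    knn = {k: sum(vals) / len(vals) for k, vals in by_k.items()}

    # Degree distribution P(k) and its first three moments.
    n = len(deg)
    P = {k: len(vals) / n for k, vals in by_k.items()}
    m1 = sum(k * p for k, p in P.items())
    m2 = sum(k ** 2 * p for k, p in P.items())
    m3 = sum(k ** 3 * p for k, p in P.items())

    num = m1 * sum(k ** 2 * knn[k] * P[k] for k in P) - m2 ** 2
    den = m1 * m3 - m2 ** 2
    r = num / den if den else float("nan")  # undefined for regular graphs
    return knn, r


# Toy example: a star with three leaves is perfectly disassortative.
knn, r = degree_correlations([(0, 1), (0, 2), (0, 3)])
print(knn, r)
```

For the star, the hub (degree 3) only sees leaves and the leaves (degree 1) only see the hub, so the average nearest-neighbor degree decreases with degree and r equals -1.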

FIG. 2: Degree distribution P(k) of the one-mode projection 
for the movie actors collaboration network (main plot) and 
the scientific collaboration network (inset). The full line has 
slope -2. 



III. EMPIRICAL DATA ON COLLABORATION 
NETWORKS 

In the present Section we revisit the empirical analysis 
of three typical social collaboration networks. We con- 
sider in particular the network formed by movie actors 
playing in the same movie, the network of scientific col- 
laborations, and the network of company board directors 
sitting on the same board. 



A. Movie actor collaboration network 

The movie actor collaboration network that we consider was obtained from the Internet
Movie Database (IMDB). Taking into account only movies with more than one actor, and
discarding duplicated actors in several movies, we finally analyze the properties of a
network composed of N = 382219 actors acting in t = 118477 films. The distribution of
movie cast size, S(n), is represented in Fig. 1(a). Apparently, this function follows an
exponential decay, with an average cast size of $\bar{n} = 12.33$ actors per movie. The
distribution Q(q) (number of movies in which an actor has played) is better fitted by a
power law decay $Q(q) \sim q^{-\gamma}$ with an apparent exponent $\gamma \approx 2$,
see Fig. 1(b). An upper cutoff of this dependence is observed around $q_c \sim 100$. The
mean number of movies played per actor is $\langle q \rangle = 3.82$.

The degree distribution of the one-mode projection of this network, P(k), is plotted in
Fig. 2. It has a power law decay with approximately the same exponent as Q(q), which
extends for close to two decades up to a sharp cutoff at $k_c \sim 2000$. The mean
degree of the network is $\langle k \rangle = 78.69$. The local clustering as a function
of degree is depicted in Fig. 3. We can observe a flat region, extending up to degree
values close to $10^2$, followed by

FIG. 3: Local clustering as a function of the degree c(k) for 
the movie actors collaboration network (main plot) and the 
scientific collaboration network (inset). 

FIG. 4: Average degree of the nearest neighbors as a function 
of the degree $\bar{k}_{nn}(k)$ for the movie actors collaboration 
network (main plot) and the scientific collaboration network 
(inset). 

a rapid decrease. The mean clustering of the one-mode projection is
$\langle c \rangle = 0.78$ and the clustering coefficient is $c = 0.17$. The
correlations in the projected network, presented in the form of the average degree of
the nearest neighbors of a vertex versus its degree, are plotted in Fig. 4. The
increasing behavior of the function $\bar{k}_{nn}(k)$ is compatible with the presence of
assortative mixing, a fact that is further confirmed by the value of the Pearson
coefficient, $r = 0.23$. In Table I we summarize the main average values obtained for
this network.



B. Scientific collaboration networks 

The next collaboration network that we analyze is the network of scientific
collaborations collected from the condensed matter preprint database at Los Alamos [40].
In this collaboration graph, the actors represent scientists who have collaborated in
the writing of a scientific paper. The complete bipartite network is composed of
t = 17828 papers and N = 16258 authors. The distribution of the number of authors in a
given paper is plotted in the inset of Fig. 1(a). This distribution is clearly
exponential, with an average value $\bar{n} = 3.05$. The distribution of the number of
papers written by any given author, inset of Fig. 1(b), shows, on the other hand, an
apparent power law behavior, even though the limited range that it takes (scarcely more
than one decade) precludes the determination of a significant exponent. The average
number of papers written by any author is in this case $\langle q \rangle = 3.35$.

The degree distribution of the one-mode projection of the scientific collaboration
network, plotted in Fig. 2, shows again a fat-tailed behavior, compatible with a power
law. The corresponding average degree is $\langle k \rangle = 5.85$. The
degree-dependent local clustering $c(k)$ and the average degree of the nearest neighbors
are explored in Figs. 3 and 4, respectively. This last result, together with a Pearson
coefficient $r = 0.31$, indicates the presence of a strong assortative mixing.
Additional numerical parameters characterizing this network are summarized in Table I.

C. Board of directorships 

The last collaboration network that we report is the network of company directors, in
which two directors are linked if they sit on the same board of directors. Table I
reports the data corresponding to the list of the "Fortune 1000" US companies, obtained
from previously published sources. It includes t = 914 companies and N = 7673 directors.
The average number of directors per company is $\bar{n} = 11.5$. Both distributions Q(q)
and P(k) can be fitted by exponentially decaying functions, although the range of values
for q and k is quite restricted. The mean degree of the projected network is
$\langle k \rangle = 14.44$. The clustering coefficient of the one-mode projected
network is quite large, and the network shows a clear assortative mixing behavior, as
given by a Pearson correlation coefficient $r = 0.28$.



IV. SELF-ORGANIZED COLLABORATION 
MODEL 

A. Definition of the model 

To understand the common properties of collaboration networks, we propose a
self-organized growing model. We exploit two generic features of collaboration networks:
(i) Social collaboration networks are organized as bipartite graphs. (ii) Social
collaboration networks are not static entities, but grow in time by the continuous
addition of new acts of collaboration (movies produced or papers written) and of new
actors, who increase the pool of possible participants in new acts of collaboration.



Using the language of movies to make the description 
more concrete, our growing bipartite network model is 
defined by the following rules: 

1. At each time step a new movie with n actors is 
added. 

2. Of the n actors playing in a new movie, m actors 
are new, without previous experience. 

3. The remaining n - m actors are chosen from the pool of 
"old" actors with a probability proportional to the 
number q of movies in which they have previously starred. 

The total number of movies is t, the "time". The number n may be either constant or a
random variable distributed with a given distribution S(n). The number m may also be
either constant or a random variable. At each time step, the total number of actors
increases as $N \to N + m$. Thus, the model generates a bipartite graph of t movie
vertices and N actor vertices. Note that the proportional preference corresponds to the
following practical rule of selection of actors: A director randomly selects a previous
movie and then chooses at random one of its actors.
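
The growth rules above can be sketched as a short simulation. This is a minimal illustration with constant n and m, not the authors' simulation code. Several choices here are assumptions that the text does not specify: the function name `grow_collaboration_network`, the requirement that the old actors within one movie be distinct, and the start-up convention that the first movies simply take as many old actors as are available.

```python
import random


def grow_collaboration_network(t_max, n, m, seed=0):
    """Grow the bipartite model for t_max movies with constant cast size n and m newcomers.

    Returns a list of casts; each cast is a list of integer actor ids.
    """
    rng = random.Random(seed)
    casts = []
    slots = []     # one entry per (actor, movie) participation
    n_actors = 0   # actors created so far
    for _ in range(t_max):
        new_actors = list(range(n_actors, n_actors + m))
        # Assumption: we cannot pick more distinct old actors than exist (start-up only).
        n_old = min(n - m, n_actors)
        old_actors = set()
        while len(old_actors) < n_old:
            # A uniform draw from `slots` selects an actor with probability
            # proportional to the number q of movies he or she has played in.
            old_actors.add(rng.choice(slots))
        cast = new_actors + sorted(old_actors)
        casts.append(cast)
        slots.extend(cast)
        n_actors += m
    return casts


casts = grow_collaboration_network(t_max=2000, n=5, m=2)
total_actors = 2000 * 2
mean_q = sum(len(cast) for cast in casts) / total_actors
print(mean_q)  # close to n / m = 2.5
```

Keeping one list entry per participation makes the preferential choice a plain uniform draw, which mirrors the "pick a previous movie, then one of its actors" rule when all casts have the same size.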



B. Analytical results 

One can see that the evolution rules of the present model practically coincide with
those of the Simon model [27]. For simplicity, let us assume that the number of actors
playing in each movie is constant and equal to its average value, $n = \bar{n}$, as well
as the number of new actors per movie, $m = \bar{m}$. This assumption is in fact quite
reasonable given the exponential nature of the S(n) distributions observed empirically.
We also assume that if the total number of actors is large, the probability that two
actors selected for a new movie have already co-starred in an earlier film is
vanishingly small. Note that, strictly speaking, this assumption is only valid for
uncorrelated networks with rapidly decreasing degree distributions. Nevertheless, the
results obtained with it provide a good enough approximation to the empirical values
(see Table I) to justify its introduction.

Within this approximation, since each movie starred in by an actor leads to the
acquisition of $\bar{n} - 1$ new co-actors, we have a strict relation between the
experience of an actor, $q$, and the total number of its co-actors (its degree in the
projected network), $k$:

$$ k = q(\bar{n} - 1). \tag{7} $$

In particular, at large $k$ and $q$, when we can consider both variables to be
continuous, we have

$$ P(k) \cong \frac{1}{\bar{n} - 1}\, Q\!\left( \frac{k}{\bar{n} - 1} \right). \tag{8} $$

In the limit of large $N$, the total number of edges in the one-mode projected graph
(the number of pairwise coactorships) is $t$ times the number of pairs of actors in a
new film, that is, $t\,\bar{n}(\bar{n} - 1)/2$, while the total number of actors is
$N = t\,\bar{m}$. Thus, the mean degree of the one-mode projection network is

$$ \langle k \rangle = \frac{\bar{n}(\bar{n} - 1)}{\bar{m}}. \tag{9} $$

Therefore,

$$ \langle q \rangle = \frac{\langle k \rangle}{\bar{n} - 1} = \frac{\bar{n}}{\bar{m}}. \tag{10} $$
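
As a quick arithmetic check of these two mean-field averages, the sketch below evaluates them with the values of the mean cast size and the mean number of newcomers per act quoted for the three networks in Table I. The function name `mean_field_averages` and the dictionary layout are hypothetical.

```python
def mean_field_averages(n_bar, m_bar):
    """Mean number of acts per actor and mean projected degree for constant n and m."""
    mean_q = n_bar / m_bar
    mean_k = n_bar * (n_bar - 1.0) / m_bar
    return mean_q, mean_k


# (n_bar, m_bar) as listed in Table I for each network.
networks = {
    "movie actors": (12.33, 3.23),
    "coauthors": (3.05, 0.91),
    "directors": (11.5, 8.39),
}
results = {name: mean_field_averages(*params) for name, params in networks.items()}
for name, (mean_q, mean_k) in results.items():
    print(f"{name}: <q> = {mean_q:.2f}, <k> = {mean_k:.2f}")
```

The printed values agree with the "analytic results" columns of Table I to two decimals.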



As in the Simon model, the connections of this growing bipartite graph self-organize
into a scale-free structure. Quite similarly to standard derivations for the Simon
model, in the large network limit, the distribution $Q(q)$ takes the form

$$ Q(q) = (\gamma - 1)\, B(q, \gamma), \tag{11} $$

where $B(\cdot\,,\cdot)$ is the $\beta$-function and

$$ \gamma = 2 + \frac{\bar{m}}{\bar{n} - \bar{m}}. \tag{12} $$



For $q \gg 1$, the asymptotics of $Q(q)$ is

$$ Q(q) \sim (q + \gamma - 1/2)^{-\gamma}, \tag{13} $$

so that $\gamma$ is the exponent of the degree distribution. In the one-mode projection,
this corresponds to

$$ P(k) \sim [k + (\gamma - 1/2)(\bar{n} - 1)]^{-\gamma}. \tag{14} $$

That is, the projected degree distribution exhibits a power law behavior, with an offset
$(\gamma - 1/2)(\bar{n} - 1)$ in $k$. The presence of this offset, which may be large
for large values of $\bar{n}$, can hinder the direct evaluation of the exponent
$\gamma$. Therefore, it is more appropriate to compare the degree distribution with the
general expression Eq. (14).
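
The stationary distribution of the number of acts per actor and its exponent can be evaluated numerically as sketched below. This is an illustration under stated assumptions: the function names are hypothetical, the beta function is evaluated through the standard recurrence B(q+1, γ) = B(q, γ) q/(q+γ) with B(1, γ) = 1/γ, and the sum is truncated at an arbitrary `q_max`.

```python
import math


def gamma_exponent(n_bar, m_bar):
    """Exponent gamma = 2 + m_bar / (n_bar - m_bar)."""
    return 2.0 + m_bar / (n_bar - m_bar)


def q_distribution(gamma, q_max):
    """Q(q) = (gamma - 1) B(q, gamma) for q = 1..q_max, built by recurrence."""
    values = [(gamma - 1.0) / gamma]          # Q(1), since B(1, gamma) = 1 / gamma
    for q in range(1, q_max):
        values.append(values[-1] * q / (q + gamma))
    return values


def q_direct(q, gamma):
    """Same quantity from log-gamma functions, used only as a cross-check."""
    log_beta = math.lgamma(q) + math.lgamma(gamma) - math.lgamma(q + gamma)
    return (gamma - 1.0) * math.exp(log_beta)


gammas = {
    "movie actors": gamma_exponent(12.33, 3.23),
    "coauthors": gamma_exponent(3.05, 0.91),
    "directors": gamma_exponent(11.5, 8.39),
}
Q = q_distribution(gammas["movie actors"], q_max=100_000)
print(gammas, sum(Q))
```

The truncated sum is very close to 1, which confirms the normalization. The first moment converges much more slowly for exponents close to 2, so a cutoff matters far more for averages of q than for the normalization.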

To calculate the clustering coefficient, we need to recall our second assumption: If we
consider a particular actor, $i$, who has played in $q$ movies, in the thermodynamic
limit none of his co-actors appears in more than one of these films. This means that in
the projected network the triangles attached to a vertex $i$ can only relate his
co-actors inside each separate movie. The number of such triangles is
$q(\bar{n} - 1)(\bar{n} - 2)/2$, while the total number of possible triangles attached
to $i$ is $k(k - 1)/2$ or, equivalently, $q(\bar{n} - 1)[q(\bar{n} - 1) - 1]/2$. The
local clustering as a function of the experience of an actor is then given by

$$ c(q) = \frac{\bar{n} - 2}{q(\bar{n} - 1) - 1}, \tag{15} $$

which, as a function of $k$, transforms into

$$ c(k) = \frac{\bar{n} - 2}{k - 1}. \tag{16} $$



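Then using the definition (3) readily yields the average clustering

$$ \langle c \rangle = \sum_{k>1} P(k)\, c(k) = (\bar{n} - 2) \sum_{k>1} \frac{P(k)}{k - 1} = \frac{\bar{n} - 2}{\bar{n} - 1} \sum_{q>0} \frac{Q(q)}{q - 1/(\bar{n} - 1)}. \tag{17} $$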



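On the other hand, to compute the clustering coefficient $c$, defined in Eq. (4), we
need to estimate the number of triangles attached to a vertex of degree $k$. As we have
seen before, this number is $q(\bar{n} - 1)(\bar{n} - 2)/2$, or, in terms of $k$,
$s(k) = k(\bar{n} - 2)/2$. Therefore,

$$ c = \frac{\sum_q Q(q)\, q(\bar{n} - 1)(\bar{n} - 2)/2}{\sum_q Q(q)\, q(\bar{n} - 1)[q(\bar{n} - 1) - 1]/2} = \frac{(\bar{n} - 2)\langle q \rangle}{(\bar{n} - 1)\langle q^2 \rangle - \langle q \rangle} = \frac{(\bar{n} - 2)\langle k \rangle}{\langle k^2 \rangle - \langle k \rangle}. \tag{18} $$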



One can see that the average clustering $\langle c \rangle$ converges to a finite value
for any degree distribution, since the region of low degrees makes the main
contribution. Consequently, Eq. (17) works well even if the degree distribution is
fat-tailed. The clustering coefficient, on the other hand, approaches zero if the second
moment of the degree distribution diverges. This divergence takes place for
$\gamma < 3$ in the thermodynamic limit ($N, t \to \infty$). In this case, $c$ crucially
depends on the degree cutoff $k_c$ (or $q_c$) in the form $c \sim k_c^{\gamma - 3}$.
Note that formula (18) may underestimate the value of the clustering coefficient if the
degree distribution is fat-tailed and $k_c$ is large.
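
The contrast between the two clustering measures can be checked numerically with the sketch below. It sums the mean clustering over the stationary distribution Q(q) = (γ - 1) B(q, γ), and evaluates the clustering coefficient from the first two moments of q truncated at a cutoff. The function names are hypothetical, and the hard truncation at `q_cut` is an assumption: the text does not state which cutoff convention was used for the analytic entries of Table I, so the cutoff-dependent values need not reproduce them.

```python
def yule_weights(gamma, q_max):
    """Yield (q, Q(q)) for q = 1..q_max with Q(q) = (gamma - 1) B(q, gamma)."""
    Q = (gamma - 1.0) / gamma
    for q in range(1, q_max + 1):
        yield q, Q
        Q *= q / (q + gamma)


def mean_clustering(n_bar, m_bar, q_max=100_000):
    """Average clustering <c>: sum of Q(q) / (q - 1/(n_bar - 1)) times (n_bar-2)/(n_bar-1)."""
    gamma = 2.0 + m_bar / (n_bar - m_bar)
    total = sum(Q / (q - 1.0 / (n_bar - 1.0)) for q, Q in yule_weights(gamma, q_max))
    return (n_bar - 2.0) / (n_bar - 1.0) * total


def clustering_coefficient(n_bar, m_bar, q_cut):
    """Clustering coefficient c from moments of q truncated at q_cut (assumed convention)."""
    gamma = 2.0 + m_bar / (n_bar - m_bar)
    q1 = q2 = 0.0
    for q, Q in yule_weights(gamma, q_cut):
        q1 += q * Q
        q2 += q * q * Q
    return (n_bar - 2.0) * q1 / ((n_bar - 1.0) * q2 - q1)


mean_c = {
    "movie actors": mean_clustering(12.33, 3.23),
    "coauthors": mean_clustering(3.05, 0.91),
    "directors": mean_clustering(11.5, 8.39),
}
c_vs_cutoff = [clustering_coefficient(12.33, 3.23, q_cut) for q_cut in (10, 100, 1000)]
print(mean_c, c_vs_cutoff)
```

The mean clustering is dominated by small q and is insensitive to the truncation, while the clustering coefficient keeps decreasing as the cutoff grows when the exponent is below 3.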



V. RESULTS AND COMPARISON WITH REAL 
NETWORKS 

To check the validity of our model, we proceed to compare the empirical data on
collaboration networks with the predictions made in the previous Section, as well as
with numerical simulations of the model. The analytic predictions are specified in terms
of two parameters, the average number of actors per collaboration act $\bar{n}$ and the
average number of new actors $\bar{m}$. If all these actors are recruited at a constant
rate, then we have on average $\bar{m} = N/t$ new actors per collaboration act. From
these two parameters, using the results of the previous Section, we can compute our
predictions for all the properties of the networks described in Section III. When
performing numerical simulations of the model, and in order to avoid discreteness, we
use randomly distributed $m$ and $n$. Their distributions are taken to be exponential
with averages $\bar{m}$ and $\bar{n}$ respectively. This functional form corresponds to
that of the distribution S(n) empirically observed for actor and scientific
collaboration networks (see Fig. 1(a)). For the company directorship network, on the
other hand, we do not have an empirical form of S(n). Hence, we checked both exponential
and Poisson distributions. The global characteristics of the networks generated with
this last distribution are closer to their empirical counterparts. The results of the
comparison between empirical data, theoretical predictions and simulations are
summarized in Table I.

We observe an agreement between the model predictions and the empirical results for the
mean clustering $\langle c \rangle$. Note some deviations in the mean degree
$\langle k \rangle$ of one-mode projected networks. These discrepancies are due to the
fact that in our analysis we neglect the probability that some actors for a new film
have previously co-starred in the same movies. For the network of codirectorships, the
computed clustering coefficient $c$ is in reasonable agreement with the empirical value.
In the other networks, however, the calculated values of $c$ are severely
underestimated. The reason for this is the poor quality of the approximate formula (18)
for this model in the case of a fat-tailed degree distribution (see discussion in
Sec. IV).

The exponents of the projected degree distribution are $\gamma = 2.35$, $\gamma = 2.43$
and $\gamma = 4.7$ for the movie, coauthorship and codirectorship networks,
respectively. The exponent larger than 3 in this last case is compatible with the
exponential decay observed empirically. The range of the empirical degree distribution
in the coauthorship network is too small to compare with the asymptotic expression
Eq. (14). So, we make this comparison only in the case of the movie actors network,
Fig. 5. As can be seen, the agreement between the theoretical and empirical
distributions is remarkable. Only at very small and very large values of the degree can
a certain discrepancy be noticed, essentially due to the continuous degree approximation
employed and the presence of a cutoff in the empirical distribution, respectively. This
upper cutoff is inevitable due to two factors: (i) an actor physically cannot have an
infinite number of co-stars, and (ii) finite size effects restrict the degrees of
vertices.

The empirical local clustering and its analogue obtained by numerical simulations
(Fig. 6) demonstrate a more complex dependence on degree than the simple estimate of
Eq. (16). From Fig. 6 we see that the function $c(k)$ computed from the model follows a
slower decay than the corresponding empirical function for the movie actors network. The
results for the mean degree of the nearest neighbors of a vertex as a function of its
degree, $\bar{k}_{nn}(k)$, are represented in Fig. 7. Unexpectedly, apart from a small
region for very small values of $k$, $\bar{k}_{nn}(k)$ decreases with degree, and the
Pearson correlation coefficient is negative (see Table I). Thus, unlike real-world
collaboration graphs, networks with a fat-tailed degree distribution, generated by the
simplest version of our model, show disassortative mixing. On the contrary, in the case
of directorship networks, the model provides positive values for the Pearson coefficient
$r$, in agreement with the empirical results, though a little lower (see discussion in
the next Section).

FIG. 5: Comparison among the degree distribution P(k) for 
the empirical movie actors collaboration network (circles), for 
the theoretical prediction of section IV (dot-dashed line), and 
for simulations of the original model (dashed curve) and the 
version with aging (solid line). 



VI. SELF-ORGANIZED MODEL WITH AGING 

At least in one aspect, the model presented above is 
a serious oversimplification of the mechanism underly- 
ing the growth of social collaboration networks. It al- 
lows an analytical treatment but leads to problems in the 
comparison with the clustering coefficient and, especially, 
degree-degree correlations of empirical networks. The 
most important missing point is probably the aging of 
individual agents. This issue is evident in the case of the 
movie actors network (although it can be observed for the 





FIG. 6: Comparison among the clustering coefficient as a 
function of the degree c(k) for the empirical movie actors col- 
laboration network (circles), for the simulations of the origi- 
nal version of the model (dashed line), and for the model with 
aging (solid line). The main plot is for the actor co-starring 
network and the inset for the scientific collaboration network. 

TABLE I: Comparison of calculations with empirical data for the movie, scientific collaboration, and codirectorship networks,
and the simulations of the model. Empty entries are marked with "-".

                    | movie actors                               | coauthors                                  | directors
                    | network    analytic   numeric    numeric    | network    analytic   numeric    numeric    | network    analytic   numeric
                    |            results    results    with aging |            results    results    with aging |            results    results
--------------------+--------------------------------------------+--------------------------------------------+--------------------------------
t                   | 118477     -          10^5       10^5       | 17828      -          10^5       10^5       | 914        -          10^5
N                   | 382219     -          -          -          | 16258      -          -          -          | 7673       -          -
$\bar{n}$           | 12.33      12.33      12.33      12.33      | 3.05       3.05       3.05       3.05       | 11.5       11.5       11.5
$\bar{m}$           | -          3.23       3.23       3.23       | -          0.91       0.91       0.91       | -          8.39       8.39
$q_c$               | ~10^2      -          -          10^2       | ~15        -          -          15         | ~10        -          -
$\langle q \rangle$ | 3.82       3.82       4.44       4.39       | 3.35       3.35       3.70       3.69       | -          1.37       1.48
$\langle k \rangle$ | 78.69      43.25      75.05      85.67      | 5.85       6.87       8.45       8.93       | 14.44      14.39      17.10
$\gamma$            | -          2.35       -          -          | -          2.43       -          -          | -          4.70       -
$\langle c \rangle$ | 0.78       0.71       0.76       0.70       | 0.64       0.68       0.50       0.43       | 0.88       0.87       0.86
c                   | 0.17       0.06^a     0.037      0.08       | 0.36       0.08^b     0.026      0.09       | 0.59       0.5        0.32
r                   | 0.23       -          -0.13      0.14       | 0.31       -          -0.08      0.40       | 0.28       -          0.11

^a The cutoff $k_c \sim 10^3$ was used. 
^b The cutoff $k_c \sim 10^2$ was used. 




k 

FIG. 7: Comparison among the average degree of the nearest 
neighbors as a function of the degree k nn (k) for the empirical 
movie actors collaboration network (circles), the simulations 
of the original version of the model (dashed line) , and for the 
model with aging (solid line). The main plot is for the actor 
co-starring network and the inset for the scientific collabora- 
tion network. 



scientific collaboration network too) . The Internet Movie 
Database site, from which the actor collaboration net- 
work was extracted, contains information spanning the 
whole century of the history of cinema, from Louis Lu- 
miere to the most recent Hollywood productions. Con- 
sidering that actors have a finite professional life time, 
it is unrealistic to allow them to take part in a movie 
irrespective of their age. 

If we take this fact into account, two main consequences are
immediately expected. On the one hand, there should be an upper cutoff
in the Q(q) distribution, corresponding to the professional life
expectancy of actors, as is actually found in empirical distributions.
On the other hand, not all actors can work together: only those who are
contemporaries. Obviously, this phenomenon affects the codirectorship
network much less, because of its exponential degree distribution.

To introduce this new ingredient in the model, we must first assume an
aging rate for individual agents. The most straightforward way to do so
is to suppose that time is directly equivalent to the experience q. In
more realistic situations, each agent may have its own aging rhythm;
however, this version of aging would make the model more complex. Once
time is identified, we must consider a survival probability
distribution for agents. In parallel with biological systems, we assume
an almost sure survival until a certain age Q_0 and an exponential
decay thereafter. The modification of the model then requires two new
parameters: the cutoff Q_0 and the characteristic time of the
exponential decay, τ. The rest of the model remains the same. That is,
in each step a new movie is produced, m actors are new, and the
remaining n - m are chosen at random with a probability proportional to
their experience. In addition, we assume that the actors become
inactive, i.e., they cannot be chosen again for new movies, with a
probability given by the complement of the survival distribution for
their particular age q.
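
The growth rule with aging described above can be sketched in a few
lines of Python. This is only a minimal illustration under stated
assumptions, not the code used for the simulations reported here. It
assumes fixed integer values of n and m, realizes the survival
distribution by giving each agent a retirement age equal to Q_0 plus an
exponentially distributed excess with mean τ, and fills the earliest
collaborations with newcomers while there are too few active agents.
The names simulate_aging_model and weighted_sample are hypothetical.

```python
import random


def weighted_sample(rng, items, weights, k):
    """Draw up to k distinct items; each draw is proportional to weight."""
    items, weights = list(items), list(weights)
    chosen = []
    while items and len(chosen) < k:
        i = rng.choices(range(len(items)), weights=weights)[0]
        chosen.append(items.pop(i))
        weights.pop(i)
    return chosen


def simulate_aging_model(steps, n, m, q0, tau, seed=0):
    """Grow a bipartite collaboration network with aging (a sketch).

    Each step adds one collaboration of n agents: newcomers plus up to
    n - m active agents chosen with probability proportional to their
    experience q (number of past collaborations).
    """
    rng = random.Random(seed)
    experience = []      # experience[a] = q of agent a
    retirement = []      # agent a is inactive once q >= retirement[a]
    collaborations = []
    for _step in range(steps):
        active = [a for a, q in enumerate(experience) if q < retirement[a]]
        veterans = weighted_sample(
            rng, active, [experience[a] for a in active], n - m)
        # Assumption: if too few agents are active, newcomers fill the gap.
        n_new = n - len(veterans)
        first_new = len(experience)
        for _new in range(n_new):
            experience.append(0)
            # Survival is certain up to q0, then decays as exp(-(q - q0)/tau).
            retirement.append(q0 + rng.expovariate(1.0 / tau))
        group = veterans + list(range(first_new, first_new + n_new))
        for a in group:
            experience[a] += 1
        collaborations.append(group)
    return collaborations, experience
```

For instance, simulate_aging_model(1000, n=12, m=3, q0=100, tau=50.0)
grows a small actor-like network; n = 12 and m = 3 are rounded from the
averages listed in Table I, which is a simplification of this sketch.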

We carried out simulations with the new version of the model. Q_0 was
fixed at 100 for the actors network and at 15 for scientific
collaborations, to agree with the cutoff observed in the empirical
distributions Q(q) of these networks. The value of the other parameter,
τ, is not so easy to establish from phenomenological data; therefore,
we checked several characteristic times. For the sake of concreteness,
let us focus on the results obtained with τ = 50 for actor co-starring
and with τ = 7 for scientific coauthorships, which are realistic values
compatible with the final decay of the empirical Q(q) distributions. In
fact, using τ twice as large, we did not observe essential differences
in the properties of the networks. Moreover, a simple exponential
survival probability (i.e., with τ as the only parameter) also provides
similar approximate values of the clustering coefficient and the
Pearson coefficient. However, it does not allow a satisfactory
description of the whole degree distribution. Note that our choice of
the aging parameters Q_0 and τ does not amount to fitting our final
results, which are the clustering and degree-degree correlation
characteristics. Indeed, the values of Q_0 and τ were chosen only to
properly describe the degree distribution in the range of large
degrees.

FIG. 8: Average degree of the nearest neighbors as a function of the
degree, k_nn(k), for an uncorrelated bipartite graph with the same
degree distributions for both types of vertices as generated by our
model (solid line). For comparison, the same quantity is displayed for
the empirical actor network (circles).

In Fig. 6 the local clustering is plotted as a function of the degree
for a network with aging and for the empirical actor network. The
dependence c(k) fits the empirical data better than in the original
model, and the computed clustering coefficients are closer to the
empirical ones (see Table I). The improvement in these coefficients is
understandable. As one can see from our simple analytical estimations,
the direct introduction of the cutoff in the degree distribution
significantly improves the values of the clustering coefficients.
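
The clustering quantities discussed here can be measured on a one-mode
projection with a short routine such as the one below. It is a sketch
that assumes the standard definitions: the local clustering of a vertex
is the fraction of linked pairs among its neighbors, c(k) averages it
over vertices of degree k, ⟨c⟩ averages it over all vertices, and c is
three times the number of triangles divided by the number of connected
triples. Vertices of degree 1 are assigned zero local clustering, which
is a convention chosen for this example. The function names are
hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(collaborations):
    """Link two agents if they share at least one collaboration."""
    neighbors = defaultdict(set)
    for group in collaborations:
        for a, b in combinations(set(group), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def clustering_stats(neighbors):
    """Return (c_of_k, mean_local, transitivity) for an undirected graph."""
    by_degree = defaultdict(list)
    closed = 0   # linked neighbor pairs; equals 3 * (number of triangles)
    triples = 0  # all neighbor pairs, i.e. connected triples
    for v, nb in neighbors.items():
        k = len(nb)
        pairs = k * (k - 1) // 2
        links = sum(1 for a, b in combinations(nb, 2) if b in neighbors[a])
        by_degree[k].append(links / pairs if pairs else 0.0)
        closed += links
        triples += pairs
    c_of_k = {k: sum(v) / len(v) for k, v in by_degree.items()}
    n_vertices = sum(len(v) for v in by_degree.values())
    total = sum(sum(v) for v in by_degree.values())
    mean_local = total / n_vertices if n_vertices else 0.0
    transitivity = closed / triples if triples else 0.0
    return c_of_k, mean_local, transitivity
```

Agents that only ever appear alone have no links and are therefore
absent from the projection returned by this sketch.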

A far more important point is that aging changes the type of
degree-degree correlations. In the version of the model with aging, the
computed dependence of the mean degree of the nearest neighbors of a
vertex on its degree properly describes the empirical dependence, as
may be seen in Fig. 7. As a result, the computed values of the Pearson
correlation coefficient turn out to be positive (assortative mixing)
and close enough to the empirical values (see Table I). One should note
that in the framework of the configuration model of an uncorrelated
bipartite network, this agreement is impossible. We have checked this
claim in the following way: we have measured the degree-degree
correlations in the one-mode projection resulting from an uncorrelated
bipartite graph with the same degree distributions for both types of
vertices as generated by our model. In contrast to the self-organized
model, the curve k_nn(k) turns out to be nearly flat (see Fig. 8), the
Pearson coefficient being close to zero. This signals that
degree-degree correlations are practically absent in this case.
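
The check described above can be illustrated with the following sketch.
It randomizes the agent-collaboration memberships by shuffling stubs,
which is one common way to realize a bipartite configuration model, and
then measures k_nn(k) and the Pearson coefficient r on the one-mode
projection. The randomization procedure is an assumption of this
example, since the text does not specify how the uncorrelated graphs
were built, and all function names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def shuffle_memberships(collaborations, rng):
    """Reassign agents to collaborations at random (stub matching)."""
    stubs = [a for group in collaborations for a in group]
    rng.shuffle(stubs)
    rewired, start = [], 0
    for group in collaborations:
        # A set merges an agent drawn twice, so sizes may shrink slightly.
        rewired.append(set(stubs[start:start + len(group)]))
        start += len(group)
    return rewired


def one_mode_projection(collaborations):
    """Link two agents if they share at least one collaboration."""
    neighbors = defaultdict(set)
    for group in collaborations:
        for a, b in combinations(set(group), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def knn_of_k(neighbors):
    """Mean nearest-neighbor degree, averaged over vertices of degree k."""
    degree = {v: len(nb) for v, nb in neighbors.items()}
    sums, counts = defaultdict(float), defaultdict(int)
    for v, nb in neighbors.items():
        k = degree[v]
        sums[k] += sum(degree[u] for u in nb) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


def pearson_r(neighbors):
    """Pearson correlation of the degrees at the two ends of an edge."""
    degree = {v: len(nb) for v, nb in neighbors.items()}
    # Every edge is listed in both directions, so x and y are symmetric.
    ends = [(degree[v], degree[u]) for v, nb in neighbors.items() for u in nb]
    if not ends:
        return 0.0
    mean = sum(x for x, _ in ends) / len(ends)
    cov = sum((x - mean) * (y - mean) for x, y in ends) / len(ends)
    var = sum((x - mean) ** 2 for x, _ in ends) / len(ends)
    return cov / var if var > 0 else 0.0
```

Comparing pearson_r(one_mode_projection(groups)) with the value obtained
after shuffle_memberships(groups, random.Random(0)) mirrors the
comparison between the self-organized model and its uncorrelated
counterpart.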

VII. CONCLUSION 

In summary, we have studied a minimal model of evolving,
self-organizing collaboration networks. This model is not based on a
static perspective, as the configuration model is, but on a dynamical
mechanism for constructing the network. Its basic constituents are
preferential attachment and the bipartite structure of social networks.
Our results show that the self-organized model offers a good starting
point to explain existing empirical data. The model was compared with
empirical results for a number of real networks, namely a network of
scientific coauthorships, a network of movie actor collaborations, and
a network of company codirectorships.

We have shown that, apart from a generic bipartite struc-
ture and the growth factor, one more element has to be
taken into account in order to explain the empirical ob- 
servations on the clustering and degree-degree correla- 
tions in collaboration networks. This key factor is the 
aging of collaborators. We demonstrate that in collabora- 
tion networks this effect is responsible for the positive (as- 
sortative) degree-degree correlations. We conclude that 
assortative mixing, which is generally observed in collab- 
oration networks, is produced by the combination of their 
bipartite structure and the aging of the collaborators. 

One should note that, in principle, even uncorrelated bipartite graphs
(the configuration model) have correlated one-mode projections.
However, the specific degree-degree correlations in these projections
are quite weak. In other words, the configuration model graphs with
degree distributions typical for movie actor networks show neither
assortative nor disassortative mixing (they have r ≈ 0). In contrast,
our self-organized model provides correlated bipartite graphs, which,
under natural assumptions, have one-mode projections with realistic
structure and realistic correlations.